Advanced Analytics and Real-Time Data Processing in Apache Spark
Accredited by CPD & iAP | FREE PDF Certificate Included | Unlimited Access for 365 Days | Quality Study Materials
Global Edulink
Summary
- Exam(s) / assessment(s) are included in price
- TOTUM card available but not included in price
Overview
Apache Spark is a unified analytics engine used for processing and analysing big data. It has gained recognition within large organisations for its speed, ease of use, standard interface and real-time data processing features, and knowing it can be a great advantage when getting into the data analysis or data science field. If you want to learn advanced analytics and real-time processing in Apache Spark, this course is a good fit: it sets out to cover the aspects of Apache Spark you need to start processing and analysing big data.
This professionally narrated course starts by diving into the architecture and components of Spark Streaming to show you how it can be used to generate final data batches. You will then explore the use cases of Spark Streaming applications so you can use it appropriately with a suitable engine, along with an insight into the Spark Streaming word count problem and the Spark Streaming API. The stressful task of managing events that arrive out of order while building streaming applications is also given due attention.
Our highly talented tutors will then guide you through creating a project using Spark’s MLlib library to give you more hands-on experience with the framework. You will then move on to explore the components and operations of Spark GraphX and create graphs for analysis, followed by a chapter on SparkR and its role in distributed data frame implementation. To top it all off, you will also be taught how to send real-time notifications when a user wants to buy a product from an e-commerce site. By the end of this course, you will have a firm grip on Apache Spark and be able to use its advanced analytics and real-time data processing features in your career.
Why study at Global Edulink?
Global Edulink offers a convenient path to gaining recognised skills and training that will give you the opportunity to put your knowledge and expertise into practice in an IT or corporate environment. You can study at your own pace at Global Edulink, and you will be provided with all the necessary material, tutorials, a qualified course instructor, narrated e-learning modules and free resources, which include a free CV writing pack, free career support and a course demo, to make your learning experience more enriching and rewarding.
Description
COURSE CURRICULUM
Module 01 : Spark Streaming
- The Course Overview
- Introducing Spark Streaming
- Streaming Context
- Processing Streaming Data
- Use Cases
- Spark Streaming Word Count Hands-On
- Spark Streaming – Understanding Master URL
- Integrating Spark Streaming with Apache Kafka
- mapWithState Operation
- Transform and Window Operation
- Join and Output Operations
- Output Operations – Saving Results to Kafka Sink
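For readers new to the word count and window topics listed in this module, the idea can be sketched in plain Python without a Spark cluster. This is only an illustration of the concept, not Spark Streaming code: the function name `windowed_word_counts`, the `window_length` parameter and the sample batches are assumptions made up for this sketch.

```python
from collections import Counter, deque


def windowed_word_counts(batches, window_length=3):
    """Yield word counts over the last `window_length` micro-batches.

    `batches` is an iterable of micro-batches, each a list of text lines.
    This mimics the idea of a sliding window over a stream; it does not
    use or reproduce the Spark Streaming API.
    """
    window = deque(maxlen=window_length)  # oldest batch drops out automatically
    for lines in batches:
        # Count the words in the current micro-batch only.
        batch_counts = Counter(
            word for line in lines for word in line.lower().split()
        )
        window.append(batch_counts)
        # Merge the per-batch counts that are still inside the window.
        total = Counter()
        for counts in window:
            total.update(counts)
        yield dict(total)


# Hypothetical input: four micro-batches of one line each.
batches = [["spark streaming"], ["spark kafka"], ["kafka kafka"], ["graphx"]]
results = list(windowed_word_counts(batches, window_length=2))
for i, counts in enumerate(results, start=1):
    print(f"after batch {i}: {counts}")
```

With a window of two batches, each result covers only the current batch and the one before it, so words from older batches stop contributing to the totals.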
Module 02 : Advanced Streaming and Use Cases
- Handling Time in High Velocity Streams
- Connecting External Systems That Work with an At-Least-Once Guarantee – Deduplication
- Building Streaming Applications – Handling Events That Are Not in Order
- Filtering Bots from Stream of Page View Events
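Two of the topics in this module, deduplication under at-least-once delivery and handling events that arrive out of order, can be illustrated with a small plain-Python sketch. It is an assumption-laden illustration rather than course material or Spark code: the `(event_id, event_time, payload)` event shape, the `allowed_lateness` parameter and the function name `process_events` are all hypothetical.

```python
def process_events(events, allowed_lateness=5):
    """Deduplicate events by id and drop events that are too late.

    Each event is a hypothetical (event_id, event_time, payload) tuple.
    The watermark is the largest event time seen so far minus
    `allowed_lateness`; events older than the watermark are dropped.
    Returns (accepted events sorted by event time, ids of dropped events).
    """
    seen_ids = set()  # unbounded here; a real system would expire old ids
    max_event_time = float("-inf")
    accepted, dropped_late = [], []
    for event_id, event_time, payload in events:
        if event_id in seen_ids:
            continue  # duplicate delivery from an at-least-once source
        seen_ids.add(event_id)
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        if event_time < watermark:
            dropped_late.append(event_id)  # arrived too far out of order
        else:
            accepted.append((event_time, event_id, payload))
    accepted.sort()  # restore event-time order
    ordered = [(eid, t, p) for t, eid, p in accepted]
    return ordered, dropped_late


# Hypothetical stream: "a" is delivered twice, "c" and "e" arrive out of order.
events = [
    ("a", 10, "view"),
    ("b", 12, "view"),
    ("a", 10, "view"),
    ("c", 11, "click"),
    ("d", 20, "view"),
    ("e", 13, "click"),
]
ordered, late = process_events(events, allowed_lateness=5)
print(ordered)
print(late)
```

Event "c" is slightly out of order but still within the allowed lateness, so it is kept and re-sorted; event "e" arrives after the watermark has moved past it and is dropped.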
Module 03 : Spark MLlib and ML Pipelines
- Introducing Machine Learning with Spark
- Feature Extraction and Transformation
- Transforming Text into Vector of Numbers – ML Bag-of-Words Technique
- Logistic Regression
- Model Evaluation
- Clustering
- Gaussian Mixture Model
- Principal Component Analysis and Distributing the Singular Value Decomposition (SVD)
- Collaborative Filtering – Building Recommendation Engine
Module 04 : Spark GraphX
- Introducing Spark GraphX – How to Represent a Graph?
- Limitations of Graph-Parallel System – Why Spark GraphX?
- Importing GraphX
- Create a Graph Using GraphX and Property Graph
- List of Operators
- Perform Graph Operations Using GraphX
- Triplet View
Module 05 : Performing Spark GraphX Operations
- Perform Subgraph Operations
- Neighbourhood Aggregations – Collecting Neighbours
- Counting Degree of Vertex
- Caching and Uncaching
- GraphBuilder
- Vertex and Edge RDD
- Structural Operators – Connected Components
Module 06 : SparkR
- Introduction to SparkR and How It’s Used
- Setting Up from RStudio
- Creating Spark DataFrames from Data Sources
- SparkDataFrames Operations – Grouping, Aggregation
- Run a Given Function on a Large Dataset Using dapply or dapplyCollect
- Run a Given Function on a Large Dataset Grouped by Input Column(s) Using gapply or gapplyCollect
- Run Local R Functions Distributed Using spark.lapply
- Running SQL Queries from SparkR
Module 07 : Analytical Use Cases
- PageRank Using Spark GraphX
- Sending Real-Time Notifications When a User Wants to Buy a Product on the E-Commerce Site
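As a rough idea of what the PageRank use case computes, here is a minimal plain-Python sketch of the standard power-iteration form of PageRank. It is not GraphX code, and GraphX's own implementation may normalise ranks differently; the function name `pagerank`, the `damping` and `iterations` defaults and the sample edge list are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Compute PageRank by power iteration over a list of (src, dst) edges.

    Ranks are kept as probabilities that sum to 1. A vertex with no
    outgoing edges spreads its rank evenly over all vertices.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every vertex receives the same base share from random jumps.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            targets = out_links[src] or nodes  # dangling vertex: spread evenly
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical graph: a cycle a -> b -> c -> a, plus d linking to a.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.4f}")
```

In this small graph, "a" ends up with the highest rank because it receives links from both "c" and "d", while "d" has no incoming links and keeps only the base share.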
Access Duration
The course will be delivered directly to you, and you will have 12 months' access to the online learning platform from the date you join the course. The course is self-paced and you can complete it in stages, revisiting the lectures at any time.
Method Of Assessment
The course is assessed online with a final, multiple-choice test, which is marked automatically. You will know instantly whether you have passed the course.
Certification
Those who pass this test will receive a certificate in Advanced Analytics and Real-Time Data Processing with Apache Spark.
Other benefits
- High-quality e-learning study materials and mock exams.
- Tutorials/materials from industry-leading experts.
- 24/7 Access to the Learning Portal.
- The benefit of applying for TOTUM extra Discount Card.
- Recognised Accredited Qualification.
- Excellent customer service and administrative support.
Who is this course for?
This course might interest individuals looking to master advanced analytics and real-time data processing in order to get into, or progress within, the data analysis or data science field.
Requirements
- Learners must be aged 16 or over and should have a basic understanding of the English language, numeracy, literacy and ICT.
- A basic knowledge of Spark programming, Apache Spark and real-time data processing is required to follow this course.
Career path
Listed below are a few of the jobs this certificate could benefit you in, along with the average UK salary per annum.
- Data analyst – £25,972 per annum
- Data scientist – £35,226 per annum
- Data manager – £29,986 per annum
- Data analysis manager – £37,349 per annum
- Data engineer – £41,223 per annum
Legal information
This course is advertised on reed.co.uk by the Course Provider, whose terms and conditions apply. Purchases are made directly from the Course Provider, and as such, content and materials are supplied by the Course Provider directly. Reed is acting as agent and not reseller in relation to this course. Reed's only responsibility is to facilitate your payment for the course. It is your responsibility to review and agree to the Course Provider's terms and conditions and satisfy yourself as to the suitability of the course you intend to purchase. Reed will not have any responsibility for the content of the course and/or associated materials.